GBE 



Genes and Junk in Plant Mitochondria — Repair Mechanisms 
and Selection 

Alan C. Christensen* 

School of Biological Sciences, University of Nebraska-Lincoln 
Corresponding author: E-mail: achristensen2@unl.edu. 
Accepted: May 27, 2014 

Abstract 

Plant mitochondrial genomes have very low mutation rates. In contrast, they also rearrange and expand frequently. This is easily 
understood if DNA repair in genes is accomplished by accurate mechanisms, whereas less accurate mechanisms including nonho- 
mologous end joining or break-induced replication are used in nongenes. An important question is how different mechanisms of 
repair predominate in coding and noncoding DNA, although one possible mechanism is transcription-coupled repair (TCR). This work 
tests the predictions of TCR and finds no support for it. Examination of the mutation spectra and rates in genes and junk reveals what 
DNA repair mechanisms are available to plant mitochondria, and what selective forces act on the repair products. A model is proposed 
that mismatches and other DNA damages are repaired by converting them into double-strand breaks (DSBs). These can then be 
repaired by any of the DSB repair mechanisms, both accurate and inaccurate. Natural selection will eliminate coding regions repaired 
by inaccurate mechanisms, accounting for the low mutation rates in genes, whereas mutations, rearrangements, and expansions 
generated by inaccurate repair in noncoding regions will persist. Support for this model includes the structure of the mitochondrial 
mutS homolog in plants, which is fused to a double-strand endonuclease. The model proposes that plant mitochondria do not 
distinguish a damaged or mismatched DNA strand from the undamaged strand; they simply cut both strands and perform homology-based
DSB repair. This plant-specific strategy for protecting future generations from mitochondrial DNA damage has the side effect of 
genome expansions and rearrangements. 

Introduction 

Plant mitochondrial genomes have followed different evolu- 
tionary trajectories from their counterparts in animals and 
fungi. The genomes are very large (up to 11 Mb) but still 
have only 30-60 genes, thus most of the DNA is noncoding. 
The mutation rate measured in protein-coding regions and 
rRNA regions is very low, but the genomes are subject to 
major rearrangements and expansions (Palmer and Herbon 
1988). The mutational burden hypothesis was proposed as 
an explanation for the paradox of low mutation rates and 
high expansion rates (Lynch et al. 2006; Lynch 2007), but 
exceptional species with both high mutation rates and high 
expansion rates have been found that defy this explanation 
(Cho et al. 2004; Parkinson et al. 2005; Sloan, Muller, et al. 
2012; Sloan et al. 2012). After comparing the mitochondrial 
noncoding sequences of two Arabidopsis thaliana ecotypes 
that had been diverged for approximately 200,000 years, 
I proposed that coding and noncoding DNAs are repaired by 
different mechanisms and thus have different mutation rates 
and spectra (Christensen 2013). Although coding regions are 
highly conserved, noncoding DNA has diverged so rapidly that 
over 200 kb of the A. thaliana mitochondrial genome is not 
alignable with any sequences outside the order Brassicales, 
suggesting that it is nonfunctional junk (Brenner 1998; 
Christensen 2013). This also explains why noncoding DNA has 
not previously been used in mutational or phylogenetic stud- 
ies — it evolves too quickly to be useful over evolutionary time 
scales. The model proposes that coding regions are repaired 
very accurately, likely by homologous recombination or gene 
conversion. Noncoding regions are repaired by inaccurate 
mechanisms of double-strand break (DSB) repair that produce 
rearrangements, chimeric genes, and genome expansion 
(Davila et al. 2011). Because there is no mechanism available 
for precisely removing junk DNA, it accumulates by Muller's 
ratchet (Muller 1964). The common feature in both coding 
and noncoding DNA is DSB repair, leading either to homology-based 
accurate repair or to inaccurate repair with duplications 
expanding the genome. 

Although this model explains the observed features of mi- 
tochondrial genomes, how the coding and noncoding DNA 
have such distinctly different mutation rates and spectra is still 
a mystery. One possible explanation is that the primary mech- 
anisms of DNA repair are different in genes and in junk, and 
the only plausible mechanism for this is transcription-coupled 
repair (TCR) (Ganesan et al. 2012; Vermeulen and Fousteri 
2013; Howan et al. 2014). The existence of cotranscribed 
genes in plant mitochondria provides an opportunity to test 
this hypothesis. In this work, I find the hypothesis of TCR to be 
unlikely and suggest a model for how mitochondrial genomes 
are repaired differently in genes and in junk. 

Results 

The hypothesis of TCR can be tested by examining both 
coding and noncoding transcribed regions, for example, the 
protein-coding regions and intergenic regions of cotranscribed 
genes. The TCR hypothesis predicts that the mutation rate in the 
coding regions should be equal to the mutation rate in the 
intergenic regions. In A. thaliana, there are four gene clusters 
shown to be cotranscribed: nad4L-atp4, rpl5-cob, nad3- 
rps12, and rps3-rpl16 (Hoffmann et al. 1999; Forner et al. 
2007) and these same clusters are observed in a wide variety 
of angiosperms (Richardson et al. 2013). The lengths of the 
intergenic regions in these transcripts in A. thaliana are 
266 bp, 1.9 kbp, 45 bp, and 0 bp (rps3 and rpl16 overlap by 
134 bp), respectively (Davila et al. 2011). Because selection 
might be acting near the translation start and stop sites, the 
two larger intergenic regions are most suitable as a test of the 
hypothesis. In several species including A. thaliana, there is an 
rps14 pseudogene between rpl5 and cob (Aubert et al. 1992; 
Quinones et al. 1996; Figueroa et al. 1999; Ong and Palmer 
2006). Because in some species rps14 is a functional gene and 
in others it is a pseudogene in the intergenic region, several 
species were chosen for analysis, all of which have a functional 
rps14 gene. The rpl5 and rps14 genes are just a few nucleotides 
apart, so only the rps14-cob intergenic region was used. 
The species chosen were all legumes with completely se- 
quenced mitochondrial genomes containing single copies of 
the nad4L-atp4 and rpl5-rps14-cob clusters. Four legumes 
were chosen: The mung bean (Vigna radiata), the azuki 
bean (Vigna angularis), the pongam tree (Millettia pinnata), 
and the fava bean (Vicia faba). Carica papaya was chosen as 
outgroup (fig. 1). 

The five coding regions, nad4L, atp4, rpl5, rps14, and cob, 
were aligned (supplementary fig. S1, Supplementary Material 
online), and the synonymous substitutions per synonymous 
site were measured using the concatenation of all five. The 
genes of plant mitochondria also show extensive RNA editing 
(Barkan and Small 2014). The edited sites were confirmed and 
annotated in the M. pinnata genome (Kazakoff et al. 2012). 
All of these are C to U edits in the mRNA and most change the 
amino acid encoded. This alters the definitions of synonymous 
and nonsynonymous sites for two reasons. If an edit of a C to 
a U in the mRNA changes the amino acid codon, then a mu- 
tation in the genome at that site from a C to a T will be a 
synonymous change, but standard methods will count that 
position as a nonsynonymous site. Several examples are in this 
data set. Of the 48 edited cytosines in these 5 genes, all are 
conserved within the legumes, but 12 of those edited sites 
have mutated to T in C. papaya (see supplementary fig. S1, 
Supplementary Material online). Of those sites, 11 would be 
classified as nonsynonymous substitutions, but the editing in 
the legumes means that the differences in C. papaya are ac- 
tually synonymous substitutions. Furthermore, the pentatrico- 
peptide repeat (PPR) proteins that mediate editing recognize 
the RNA sequence upstream of the edit (Barkan and Small 
2014), so changes in these positions will all be nonsynon- 
ymous if they affect editing efficiency, even if the amino 
acid sequence at the site of the mutation does not change. 
For this reason, the analysis was done twice: Once using the 
entire coding regions and again with any edited codons and 
the six preceding codons removed from the alignment. The 
intergenic regions between nad4L and atp4 and between 
rps14 and cob were also aligned (supplementary fig. S2, 
Supplementary Material online), and the mutation rate was 
determined using the concatenation of both alignments, in- 
cluding both transitions and transversions, but not indels. The 
rates are shown in table 1 and graphed in figure 2. The sub- 
stitution rate in the intergenic region is higher than the syn- 
onymous substitution rate in the complete coding sequences. 
When edited sites are removed, the substitution rate in the 
intergenic sequence is still higher than in coding sequences, 
but the difference is not statistically significant in most cases. 
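
The second pass of the analysis, in which edited codons and the six preceding codons are removed, can be sketched as follows. This is a minimal illustration, not the procedure actually used (the text does not say how the masking was carried out). The function name, the dictionary layout, and the use of 0-based codon indices in alignment coordinates are all assumptions made for the example.

```python
def mask_edited_codons(aligned_cds, edited_codons, upstream=6):
    """Drop each edited codon plus the `upstream` codons before it.

    aligned_cds   : dict mapping a species name to an in-frame, aligned
                    coding sequence (all sequences the same length).
    edited_codons : iterable of 0-based codon indices (alignment
                    coordinates) that contain an edited cytosine.
    upstream      : number of preceding codons to drop as well (6 in the text).

    Hypothetical helper for illustration only.
    """
    lengths = {len(seq) for seq in aligned_cds.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    n_codons = lengths.pop() // 3  # a trailing partial codon is ignored

    # Collect every codon index that has to be removed.
    drop = set()
    for idx in edited_codons:
        drop.update(range(max(0, idx - upstream), idx + 1))

    keep = [i for i in range(n_codons) if i not in drop]
    return {
        name: "".join(seq[3 * i:3 * i + 3] for i in keep)
        for name, seq in aligned_cds.items()
    }


# Toy alignment of ten codons; codon 8 is treated as edited, so codons
# 2 through 8 are removed and codons 0, 1, and 9 remain.
toy = {"sp1": "ATGGCTAAACCCGGGTTTACGTGCTCATAA",
       "sp2": "ATGGCAAAACCCGGGTTTACGTGCTTATAA"}
masked = mask_edited_codons(toy, edited_codons=[8])
print(masked)
```

Dropping whole codons in every sequence at once keeps the alignment in frame, so the masked alignment can be passed to the same synonymous-rate calculation as the complete one.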

However, the substitution rate is only a small part of the 
story. The alignments also reveal frequent nucleotide losses 
and gains in the intergenic regions (particularly just upstream 
of cob). The intergenic regions have mutated much more ex- 
tensively than the coding regions when indels are taken into 
account. As shown previously, most of the intergenic regions 
in plant mitochondria cannot even be aligned except between 
very closely related species (Christensen 2013). Without the 
flanking coding regions of rps14 and cob, the intergenic 
region between them cannot be accurately aligned using 
these five species. 

If TCR is the mechanism of repair in plant mitochondria, 
then the mutation rate in a transcribed intergenic region 
should be the same as the neutral mutation rate measured 
by synonymous substitutions in the coding regions of the 
same transcripts, and the frequency of indels in the intergenic 
regions should be low, as in the coding regions. Indels in the 
coding regions are rare and are always in-frame, whereas in 
the intergenic regions, there are more indels per nucleotide; 
therefore, the hypothesis of TCR is most likely incorrect. 
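
One way to make the rate comparison in table 1 explicit is a z statistic on the difference between two estimates. The sketch below assumes the coding and intergenic estimates are independent and approximately normal, which is an assumption of this example and not a test described in the text. The function name and the choice of rows are illustrative; the numbers are copied from table 1.

```python
import math


def rate_difference_z(rate_a, se_a, rate_b, se_b):
    """Return z = (rate_b - rate_a) / sqrt(se_a^2 + se_b^2).

    Treats the two estimates as independent (an assumption).
    |z| > 1.96 corresponds to P < 0.05 in a two-sided normal test.
    """
    pooled = math.hypot(se_a, se_b)
    if pooled == 0:
        return 0.0 if rate_a == rate_b else math.inf
    return (rate_b - rate_a) / pooled


# (species pair, CDS, CDS - edits, intergenic), each as (rate, standard error)
ROWS = [
    ("C. papaya / M. pinnata", (0.0570, 0.0093), (0.0636, 0.0114), (0.0873, 0.0095)),
    ("Vic. faba / V. radiata", (0.0323, 0.0074), (0.0424, 0.0097), (0.0491, 0.0078)),
    ("V. angularis / V. radiata", (0.0000, 0.0000), (0.0000, 0.0000), (0.0102, 0.0032)),
]

for pair, cds, cds_minus_edits, intergenic in ROWS:
    z_full = rate_difference_z(*cds, *intergenic)
    z_masked = rate_difference_z(*cds_minus_edits, *intergenic)
    print(f"{pair}: z(CDS) = {z_full:.2f}, z(CDS - edits) = {z_masked:.2f}")
```

Under TCR the expected difference is zero, so large positive z values in the intergenic column argue against the hypothesis. Indels are not captured by this statistic and have to be considered separately.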

Fig. 1. Phylogenetic relationships of the species studied. Tree showing the relationships between the four legumes used in this study and the outgroup 
Carica papaya. Based on Soltis et al. (2011). Tip labels as drawn, top to bottom: C. papaya, V. faba, M. pinnata, V. angularis, V. radiata. 



Table 1 

Mutation Rates in Coding and Intergenic Regions 

Species 1        Species 2            CDS                 CDS - edits         Intergenic
Carica papaya    Vicia faba           0.0735 ± 0.0107     0.0826 ± 0.0135     0.1009 ± 0.0092
C. papaya        Millettia pinnata    0.0570 ± 0.0093     0.0636 ± 0.0114     0.0873 ± 0.0095
C. papaya        Vigna angularis      0.0543 ± 0.0084     0.0677 ± 0.0117     0.0860 ± 0.0091
C. papaya        V. radiata           0.0543 ± 0.0085     0.0677 ± 0.0117     0.0896 ± 0.0094
Vic. faba        M. pinnata           0.0285 ± 0.0064     0.0280 ± 0.0080     0.0411 ± 0.0064
Vic. faba        V. angularis         0.0313 ± 0.0074     0.0411 ± 0.0097     0.0477 ± 0.0075
Vic. faba        V. radiata           0.0323 ± 0.0074     0.0424 ± 0.0097     0.0491 ± 0.0078
M. pinnata       V. angularis         0.0128 ± 0.0041     0.0189 ± 0.0063     0.0209 ± 0.0040
M. pinnata       V. radiata           0.0128 ± 0.0041     0.0189 ± 0.0063     0.0264 ± 0.0054
V. angularis     V. radiata           0.0000 ± 0.0000     0.0000 ± 0.0000     0.0102 ± 0.0032

Note. Synonymous substitution rates in the coding sequences (CDS), coding sequences with edited regions removed (CDS - edits), and intergenic regions are shown 
(± standard errors). Analyses were conducted using the Kumar model (Nei and Kumar 2000). The analysis involved 5 nucleotide sequences. All positions containing gaps and missing 
data were eliminated. There were a total of 967 positions in the CDS data set, 712 positions in the CDS - edits data set, and 1,620 positions in the intergenic data set. Of 
these positions in the CDS data set, there were 51 variants within the 4 legumes, including 20 synonymous substitutions, 26 nonsynonymous substitutions, and 5 in-frame 
indels. 




Fig. 2. Mutation rates in coding regions (CDS) and noncoding regions. Synonymous substitution rates in the CDS of nad4L, atp4, rpl5, rps14, and cob 
and the coding regions without the edited regions (CDS - edits) were calculated as described in the text. Substitution rates in the intergenic regions between 
nad4L and atp4 and between rps14 and cob were also calculated as described in the text. Standard errors are shown. Legend: CDS, CDS - edits, intergenic. 
Horizontal axis: species compared. 

Discussion 

How do plant mitochondrial genomes and their repair systems 
produce genes with very low synonymous substitution rates, 
but intergenic regions with high substitution, indel, genome 
expansion, and rearrangement rates? One possibility was a 
different DNA repair pathway in genes and in junk, but the 
only plausible mechanism is TCR, which can be ruled out. The 
explanation must therefore be a combination of the available 
DNA repair pathways and selection on the DNA postrepair. 
Plant mitochondria have a short-patch base-excision repair 
system, at least for removal of uracil (Boesch et al. 2009), 
but there is no evidence for long-patch base-excision repair 
or nucleotide-excision repair (Gualberto et al. 2013). Genome 
evolution and the rearrangements seen in mutants suggest 
that DSB repair is an important process in plant mitochondria 
(Shedge et al. 2007; Arrieta-Montiel et al. 2009; Davila et al. 
2011; Janicka et al. 2012; Miller-Messmer et al. 2012; 
Christensen 2013). DSB repair has multiple modalities that 
can produce either very accurate or inaccurate repair. One 
pathway, break-induced replication (BIR), can also result in 
large duplications, particularly if the break invades another 
DNA molecule at a homeologous site (Llorente et al. 2008; 
Cappadocia et al. 2010). 

Other than short-patch base-excision repair, little is known 
about DNA repair proteins in mitochondria, except for the 
MSH1 protein, a mitochondrially targeted homolog of mis- 
match repair proteins. It has been suggested that the MSH1 
protein plays a role in homology surveillance during DSB repair 
(Abdelnoor et al. 2003; Shedge et al. 2007; Arrieta-Montiel 
et al. 2009; Davila et al. 2011). Nuclear and bacterial mismatch 
repair systems include a strand-discrimination mechanism that 
directs endonuclease cleavage and repair to the newly synthe- 
sized DNA strand (Kunkel and Erie 2005; Ghodgaonkar et al. 
2013). Homologs of the strand-discrimination components 
have not been identified in plant organelles; however, the 
MSH1 protein of higher plants is fused directly to an endonu- 
clease domain (Abdelnoor et al. 2006). Sequence comparisons 
and modeling showed that the endonuclease domain is similar 
to the GIY-YIG homing endonuclease I-TevI, which makes 
DSBs as a monomer (Mueller et al. 1995; Kleinstiver et al. 
2013). This suggests a model for DNA repair in plant mito- 
chondria of lesion recognition followed by double-strand 
breakage, catalyzed by MSH1 and other unknown nucleases. 
A DSB eliminates the need for a strand-discrimination system 
but requires a template. 

If DNA damage (other than what can be repaired by short- 
patch base-excision repair, such as deaminated cytosine) is 
converted into DSBs, and these breaks are then processed 
by DSB repair mechanisms, there are a number of possible 
outcomes. Alternative pathways for processing the DSB will 
depend on whether a template molecule is available and 
whether the second broken end is captured by the repair 
event. If the two DNA ends are coordinated, nonhomologous 
end joining can be very accurate, but otherwise it can lead to 
chimeric gene formation and duplications. BIR at a homolo- 
gous region may lead to large duplications and can also shift 
the stoichiometry of different parts of the genome. BIR at a 
short region of homology (such as the 50-500 bp repeats) will 
lead to rearrangements and genome expansion; BIR at micro- 
homologies of a few nucleotides can also produce chimeric 
genes. Homologous recombination or gene conversion will 
accurately repair the DSBs. The question still remains of how 
coding sequences are repaired so accurately while the 
noncoding regions experience rapid change. 

The most likely explanation is that both types of DSB repair 
occur in all parts of the genome, but selection determines 
which outcomes we can observe (fig. 3). DSB repair can 
occur in either coding or noncoding DNA and can either be 
accurate or inaccurate. In noncoding DNA, accurate repair 
presumably occurs but is impossible to observe in alignments. 
Inaccurate repair leads to expansions, mutations, and rearran- 
gements, which are observed. In coding DNA, mitochondria 
with inaccurately repaired essential genes may be eliminated 
from the cell, or not inherited, thus what we observe in coding 
DNA is repair that maintains gene function, explaining the low 
synonymous substitution and indel rate. Accurate, homology- 
based repair such as gene conversion can explain the obser- 
vations in coding sequences. If a template is not available 
within a mitochondrion, mitochondrial fusion could occur to 
make a template DNA molecule available. This model, that 
most DNA repair is mediated via generating DSBs followed 
by the DSB repair pathways and selection for functional mito- 
chondria within a cell, can explain the evolution of plant mi- 
tochondrial genomes. 
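
The logic of this model, identical repair everywhere with selection filtering what is observed, can be illustrated with a toy simulation. Everything in the sketch below is hypothetical: the parameter values are arbitrary and are not estimates from this study, and the assumption that every inaccurate repair in a coding region is eliminated is a deliberate simplification.

```python
import random


def simulate_repair(n_events=10_000, coding_fraction=0.1,
                    p_inaccurate=0.2, seed=0):
    """Toy model: the same DSB repair process acts on all DNA.

    Each DSB falls in coding DNA with probability `coding_fraction`
    and is repaired inaccurately with probability `p_inaccurate`,
    regardless of where it falls. Inaccurate repairs in coding DNA
    are assumed to be removed by selection (a simplifying assumption).
    Returns (observed changes per region, number of eliminated events).
    """
    rng = random.Random(seed)
    observed = {"coding": 0, "noncoding": 0}
    eliminated = 0
    for _ in range(n_events):
        in_coding = rng.random() < coding_fraction
        inaccurate = rng.random() < p_inaccurate
        if not inaccurate:
            continue  # accurate repair leaves no trace in an alignment
        if in_coding:
            eliminated += 1  # lost from the lineage, never observed
        else:
            observed["noncoding"] += 1  # persists as junk DNA change
    return observed, eliminated


observed, eliminated = simulate_repair()
print("observed changes:", observed)
print("eliminated by selection:", eliminated)
```

Although the repair step never distinguishes genes from junk, the surviving changes are confined to noncoding DNA, which is the pattern the model is meant to explain.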

An interesting additional question is why natural selection 
has favored this mechanism of DNA repair in plant mitochon- 
dria but not in animal mitochondria or the nucleus. Recent 
work showed that in animals the female germline sequesters a 
subset of mitochondria that are relatively inactive in producing 
reactive oxygen species and other DNA damaging agents, to 
minimize transmission of mitochondrial mutations (de Paula 
et al. 2013). Both plants and animals need to avoid the inher- 
itance of accumulated mitochondrial mutations and appear to 
use different mechanisms to accomplish that. Plants do not 
have the luxury of specifying a germline, so converting 
damage into DSBs followed by accurate template-directed 
repair ensures that the genes will be faithfully inherited. The 
side effect of using DSB repair for nearly every type of damage 
is genome expansion and accumulation of chimeric genes, but 
the benefit of accurate transmission of mitochondrial genes to 
the next generation must outweigh the relatively minor cost of 
replicating a large mitochondrial genome. Finally, the 
mutational burden hypothesis does not appear to apply to 
plant mitochondria. In addition to mutations in the junk 
DNA apparently being mostly neutral, the specific repair 
mechanisms available do not lead to an inverse correlation 
between mutation rate and genome size. This model further 
predicts that if mechanisms such as base-excision repair or 
mismatch repair are less effective or transiently lost in a lineage, 
DSB repair will produce genome expansions at the same 
time as base substitution rates increase. This also predicts a 
loss of editing sites and can explain the counterintuitive positive 
correlation between mutation rates and genome expansions 
in plant mitochondria. 

Fig. 3. Model for mitochondrial DNA repair explaining differences between genes and junk. The diagram shows the fate of DSBs. These can be 
repaired by nonhomologous or template-based repair, and a template can either be a sister DNA molecule or be a short stretch of identity in a different 
context in the same or a different DNA molecule. Pathways and outcomes shown in the diagram: 

Homologous recombination (very large repeat); gene conversion: Accurate repair. No mutation, rearrangement or expansion. 

Short homology-based repair (homologous recombination with intermediate repeat; break-induced replication): Accurate repair. Genome 
expansion and rearrangements. Mutations in essential genes eliminated by selection. 

Non-homologous repair (non-homologous end-joining; microhomology-mediated break-induced replication): Inaccurate repair. Genome 
expansion and rearrangements. Chimeric gene production. Mutations in essential genes eliminated by selection. 



Materials and Methods 

Complete mitochondrial genome sequences used were accessions 
KC189947 for V. faba (Negruk 2013), JN872550 for M. 
pinnata (Kazakoff et al. 2012), AP012599 for V. angularis 
(Naito et al. 2013), HM367685 for V. radiata (Alverson et al. 
2011), and EU431224 for C. papaya (Ming et al. 2008). 
Sequence manipulation to extract the specific genes and intergenic 
regions studied was done using the VectorNTI 11.5.0 
package from Invitrogen. 

Alignments were done using MUSCLE (Edgar 2004) as im- 
plemented in MEGA6 (Tamura et al. 2013). Alignments were 
prepared for figures using Jalview (Waterhouse et al. 2009). 
Synonymous substitution rates and standard error estimates 
were calculated by MEGA6, using the Kumar model (Nei and 
Kumar 2000) with all ambiguous positions removed for each 
sequence pair. Substitution rates in noncoding regions were 
calculated by MEGA6, using Kimura's two-parameter model 
(Kimura 1980), including both transitions and transversions, 
with all ambiguous positions removed for each sequence pair. 
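
For reference, the two-parameter distance of Kimura (1980) is d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of sites differing by a transition and by a transversion. The sketch below computes it for one pair of aligned sequences, skipping any site where either sequence has a gap or an ambiguous base. It is a minimal illustration and not the MEGA6 implementation; the function name is an assumption, and standard errors are not computed.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences.

    Sites where either sequence is not A, C, G, or T (gaps, N, other
    ambiguity codes) are dropped for this pair. Raises ValueError if
    no comparable sites remain or if the sequences are too divergent
    for the logarithm to be defined.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguous base: skip this site
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1   # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine

    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Ten sites with one transition (A/G) and one transversion (A/C).
d = k2p_distance("AAAAAAAAAA", "GAAAAAAAAC")
print(f"K2P distance: {d:.4f}")
```

Indels are ignored by construction, which matches the statement in the Results that substitution rates include transitions and transversions but not indels.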



Supplementary Material 

Supplementary figures S1 and S2 are available at Genome 
Biology and Evolution online (http://www.gbe.oxfordjournals.org/). 